Clustering and regionalisation

Caution

This course material is currently under construction and is likely incomplete. The final version will be released in October 2023.

This session is all about finding groups of similar observations in data using clustering techniques.

Many questions and topics are complex phenomena that involve several dimensions and are hard to summarise in a single variable. In statistical terms, we call this family of problems multivariate, as opposed to univariate cases where only a single variable is considered in the analysis. Clustering tackles these questions by reducing their dimensionality (the number of relevant variables the analyst needs to look at) and converting it into a more intuitive set of classes that even non-technical audiences can make sense of. For this reason, it is widely used in applied contexts such as policymaking or marketing. In addition, since these methods do not require many preliminary assumptions about the structure of the data, clustering is a commonly used exploratory tool, as it can quickly give clues about the shape, form and content of a dataset.

The basic idea of statistical clustering is to summarise the information contained in several variables by creating a relatively small number of categories. Each observation in the dataset is then assigned to one, and only one, category depending on its values for the variables originally considered in the classification. If done correctly, the exercise reduces the complexity of a multi-dimensional problem while retaining most of the meaningful information contained in the original dataset. This is because, once classified, the analyst only needs to look at which category each observation falls into, instead of considering the multiple values associated with each of the variables and trying to figure out how to put them together in a coherent sense. When the clustering is performed on observations that represent areas, the technique is often called geodemographic analysis.

Although many techniques exist to statistically group observations in a dataset, all of them are based on the same premise: using a set of attributes to define classes or categories of observations that are similar within each class but different between classes. How similarity within groups and dissimilarity between them are defined, and how the classification algorithm is operationalised, is what makes techniques differ, and also what makes each of them particularly well suited for specific problems or types of data.
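To make "similar within, different between" concrete, here is a toy sketch on synthetic data (two artificial groups, nothing to do with SIMD) comparing the average pairwise Euclidean distance within one group against the average distance across two well-separated groups:

```python
import numpy as np

rng = np.random.default_rng(0)
# two synthetic groups of 2-dimensional observations, centred far apart
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=1.0, size=(50, 2))

def mean_pairwise_distance(x, y):
    """Average Euclidean distance over all pairs drawn from x and y."""
    diffs = x[:, None, :] - y[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).mean()

within = mean_pairwise_distance(group_a, group_a)
between = mean_pairwise_distance(group_a, group_b)
# a good grouping keeps within-group distances well below between-group ones
print(within < between)
```

Clustering algorithms differ mainly in which distance they use and in how they search for a grouping that keeps the within-group figure small relative to the between-group one.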

In the case of analysing spatial data, there is a subset of methods that are of particular interest for many common cases in Spatial Data Science. These are the so-called regionalisation techniques. Regionalisation methods can take many forms but, at their core, they all involve statistical clustering of observations with the additional constraint that observations need to be geographical neighbours to fall in the same category. Because of this, rather than category, we will use the term area for each observation and region for each category; hence regionalisation, the construction of regions from smaller areas.

The Python package you will use for clustering today is called scikit-learn and can be imported as sklearn.

import geopandas as gpd
import seaborn as sns
from libpysal import graph
from sklearn import cluster

Attribute-based clustering

In this session, you will be working with another dataset you should already be familiar with: the Scottish Index of Multiple Deprivation. This time, you will focus only on the area of Glasgow City, prepared for this course.

Scottish Index of Multiple Deprivation

As always, the table can be read from the site:

simd = gpd.read_file("data/glasgow_simd_2020.gpkg")

Instead of reading the file directly off the web, it is possible to download it manually, store it on your computer, and read it locally. To do that, you can follow these steps:

  1. Download the file by right-clicking on this link and saving the file
  2. Place the file in the same folder as the notebook where you intend to read it
  3. Replace the code in the cell above with:
simd = gpd.read_file(
    "glasgow_simd_2020.gpkg",
)

Inspect the structure of the table:

simd.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 746 entries, 0 to 745
Data columns (total 52 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   DataZone    746 non-null    object  
 1   DZName      746 non-null    object  
 2   LAName      746 non-null    object  
 3   SAPE2017    746 non-null    int64   
 4   WAPE2017    746 non-null    int64   
 5   Rankv2      746 non-null    int64   
 6   Quintilev2  746 non-null    int64   
 7   Decilev2    746 non-null    int64   
 8   Vigintilv2  746 non-null    int64   
 9   Percentv2   746 non-null    int64   
 10  IncRate     746 non-null    object  
 11  IncNumDep   746 non-null    int64   
 12  IncRankv2   746 non-null    float64 
 13  EmpRate     746 non-null    object  
 14  EmpNumDep   746 non-null    int64   
 15  EmpRank     746 non-null    float64 
 16  HlthCIF     746 non-null    int64   
 17  HlthAlcSR   746 non-null    int64   
 18  HlthDrugSR  746 non-null    int64   
 19  HlthSMR     746 non-null    int64   
 20  HlthDprsPc  746 non-null    object  
 21  HlthLBWTPc  746 non-null    object  
 22  HlthEmergS  746 non-null    int64   
 23  HlthRank    746 non-null    int64   
 24  EduAttend   746 non-null    object  
 25  EduAttain   746 non-null    float64 
 26  EduNoQuals  746 non-null    int64   
 27  EduPartici  746 non-null    object  
 28  EduUniver   746 non-null    object  
 29  EduRank     746 non-null    int64   
 30  GAccPetrol  746 non-null    float64 
 31  GAccDTGP    746 non-null    float64 
 32  GAccDTPost  746 non-null    float64 
 33  GAccDTPsch  746 non-null    float64 
 34  GAccDTSsch  746 non-null    float64 
 35  GAccDTRet   746 non-null    float64 
 36  GAccPTGP    746 non-null    float64 
 37  GAccPTPost  746 non-null    float64 
 38  GAccPTRet   746 non-null    float64 
 39  GAccBrdbnd  746 non-null    object  
 40  GAccRank    746 non-null    int64   
 41  CrimeCount  746 non-null    int64   
 42  CrimeRate   746 non-null    int64   
 43  CrimeRank   746 non-null    float64 
 44  HouseNumOC  746 non-null    int64   
 45  HouseNumNC  746 non-null    int64   
 46  HouseOCrat  746 non-null    object  
 47  HouseNCrat  746 non-null    object  
 48  HouseRank   746 non-null    float64 
 49  Shape_Leng  746 non-null    float64 
 50  Shape_Area  746 non-null    float64 
 51  geometry    746 non-null    geometry
dtypes: float64(16), geometry(1), int64(22), object(13)
memory usage: 303.2+ KB

Before we jump into exploring the data, one additional step will come in handy down the line. Not every variable in the table is an attribute we want to use for clustering. In particular, we are interested in the sub-ranks based on the individual SIMD domains, so we will only consider those. Let us first write them out manually so they are easier to subset:

subranks = [
    "IncRankv2",
    "EmpRank",
    "HlthRank",
    "EduRank",
    "GAccRank",
    "CrimeRank",
    "HouseRank"
]

You can quickly familiarise yourself with those variables by plotting a few maps like the one below to build your intuition about what is going to happen.

simd[["IncRankv2", "geometry"]].explore("IncRankv2", tiles="CartoDB Positron", tooltip=False)

You can see a decent degree of spatial variation between different sub-ranks. Even though we only have seven variables, it is very hard to “mentally overlay” all of them to come up with an overall assessment of the nature of each part of Glasgow. For pairwise relationships, a useful tool is the scatterplot matrix, available in seaborn as pairplot:

_ = sns.pairplot(simd[subranks],height=1, plot_kws={"s":1})

pairplot

This is helpful for considering uni- and bivariate questions such as: what is the relationship between the ranks? Is health correlated with income? However, sometimes this is not enough and we are interested in more sophisticated questions that are truly multivariate; in these cases, the figure above cannot help us. For example, it is not straightforward to answer questions like: what are the main characteristics of the South of Glasgow? What areas are similar to the core of the city? Are the East and West of Glasgow similar in terms of deprivation levels? For these kinds of multi-dimensional questions (involving multiple variables at the same time) we require a truly multidimensional method like statistical clustering.

K-Means

A cluster analysis involves the classification of the areas that make up a geographical map into groups, or categories, of observations that are similar to one another within a group but different between groups. The classification is carried out by a statistical clustering algorithm that takes a set of attributes as input and returns the group (“label” in the terminology) each observation belongs to. Depending on the particular algorithm employed, additional parameters, such as the desired number of clusters or more advanced tuning parameters (e.g. bandwidth, radius, etc.), also need to be provided as inputs. For our classification of SIMD in Glasgow, we will start with one of the most popular clustering algorithms: K-means. This technique only requires the observation attributes and the final number of groups to cluster the observations into. We will use five to begin with, as this will allow us to have a closer look at each of them.

Although the underlying algorithm is not trivial, running K-means in Python is streamlined thanks to scikit-learn. As with the rest of the extensive set of algorithms available in the library, its computation is a matter of a couple of lines of code. First, we need to specify the parameters in the KMeans method (which is part of scikit-learn’s cluster submodule). Note that, at this point, we do not even need to pass the data:

kmeans5 = cluster.KMeans(n_clusters=5, random_state=42)

n_clusters specifies the number of clusters you want to get and random_state sets the random generator to a known state, ensuring that the result is always the same.

This sets up an object that holds all the parameters required to run the algorithm. To actually run the algorithm on the attributes, we need to call the fit method in kmeans5:

kmeans5.fit(simd[subranks])

fit() takes an array of data, so we pass the columns of simd with the sub-ranks and run the clustering algorithm on them.
KMeans(n_clusters=5, random_state=42)

The kmeans5 object now contains several components that can be useful for an analysis. For now, we will use the labels, which represent the different categories in which we have grouped the data. Remember, in Python, life starts at zero, so the group labels go from zero to four. Labels can be extracted as follows:

kmeans5.labels_
array([1, 2, 2, 4, 2, 1, 0, 4, 4, 0, 0, 1, 1, 1, 1, 1, 1, 2, 0, 0, 2, 2,
       0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 4, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 2, 4, 4, 0, 2, 2, 2, 0, 4, 2, 4, 0, 1, 4, 4, 4, 4, 0,
       0, 0, 2, 0, 2, 0, 4, 4, 4, 0, 0, 0, 4, 0, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 4, 0, 0, 0, 0, 4, 4, 4, 4, 0, 2, 2, 2, 2, 2, 4, 4, 4,
       4, 4, 4, 2, 4, 4, 0, 2, 2, 1, 4, 1, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 4, 2, 4, 3, 3, 2, 4, 4, 2, 4, 2, 3, 4, 3, 3,
       4, 1, 1, 3, 1, 3, 1, 4, 3, 4, 3, 3, 4, 3, 3, 4, 1, 1, 3, 3, 3, 2,
       3, 2, 2, 4, 0, 0, 2, 1, 0, 2, 4, 0, 0, 4, 0, 4, 1, 1, 1, 1, 4, 4,
       4, 1, 1, 1, 1, 4, 4, 1, 3, 1, 1, 1, 4, 1, 2, 0, 4, 4, 2, 2, 0, 2,
       0, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 4, 4, 4, 2, 4, 4, 4, 4, 4, 1, 1,
       4, 4, 4, 1, 1, 4, 4, 3, 4, 4, 1, 2, 4, 4, 4, 4, 4, 2, 2, 2, 4, 0,
       0, 0, 2, 2, 2, 4, 2, 0, 4, 4, 0, 2, 2, 4, 3, 2, 0, 4, 4, 0, 4, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2,
       2, 2, 4, 4, 1, 4, 4, 4, 4, 4, 4, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1,
       1, 0, 4, 1, 0, 4, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 0, 0, 0, 0, 0,
       2, 2, 2, 2, 2, 4, 4, 1, 0, 0, 0, 0, 2, 0, 4, 0, 0, 0, 0, 0, 0, 1,
       2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 0,
       2, 2, 2, 0, 2, 2, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 2, 2, 4, 4, 0,
       4, 2, 0, 0, 4, 0, 4, 0, 0, 0, 1, 1, 4, 1, 1, 1, 1, 1, 0, 0, 2, 2,
       2, 0, 2, 4, 2, 1, 0, 2, 1, 2, 2, 2, 4, 2, 2, 2, 4, 2, 2, 2, 2, 4,
       3, 2, 2, 0, 2, 2, 3, 1, 0, 0, 2, 0, 0, 0, 4, 4, 4, 4, 4, 2, 4, 4,
       0, 2, 2, 0, 0, 0, 4, 4, 4, 3, 4, 2, 2, 2, 2, 4, 3, 0, 4, 4, 0, 3,
       0, 3, 3, 4, 4, 4, 0, 4, 0, 3, 3, 3, 3, 2, 2, 2, 3, 3, 0, 3, 3, 4,
       3, 3, 3, 3, 4, 3, 3, 4, 3, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 4, 3,
       3, 2, 2, 2, 0, 2, 2, 4, 0, 4, 4, 4, 2, 2, 3, 2, 0, 0, 2, 0, 0, 0,
       0, 2, 0, 0, 0, 0, 2, 2, 2, 2, 0, 4, 4, 1, 0, 1, 4, 0, 4, 2, 0, 0,
       0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 2, 1, 1, 1, 1, 1, 1, 4, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 4, 3, 3, 4, 2, 3, 4, 2, 4, 2, 3, 4, 4, 3, 4, 2, 4,
       4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 1, 3, 1, 3, 1, 1, 2, 1, 1,
       3, 1, 1, 1, 4, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 4, 2, 2,
       2, 4, 2, 4, 2, 2, 4, 2, 4, 2, 0, 2, 4, 2, 4, 0, 4, 4, 2, 4, 1, 2,
       2, 4, 4, 1, 1, 1, 1, 2, 2, 4, 4, 4, 4, 1, 4, 4, 0, 4, 4, 0, 4, 0,
       0, 4, 1, 4, 2, 0, 0, 0, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2, 4, 2],
      dtype=int32)

Each number represents a different category, so two observations with the same number belong to the same group. The labels are returned in the same order as the input attributes were passed in, which means we can append them to the original table of data as an additional column:

simd["kmeans_5"] = kmeans5.labels_
simd["kmeans_5"].head()
0    1
1    2
2    2
3    4
4    2
Name: kmeans_5, dtype: int32

To get a better understanding of the classification we have just performed, it is useful to display the categories created on a map. For this, we will use a unique values choropleth, which will automatically assign a different color to each category:

simd[["kmeans_5", 'geometry']].explore("kmeans_5", categorical=True, tiles="CartoDB Positron")

The map above represents the geographical distribution of the five categories created by the K-means algorithm. A quick glance shows a strong spatial structure in the distribution of the colours: group 3 (grey) is mostly in central areas and towards the west, group 1 (green) covers peripheries and so on, but not all clusters are equally represented.

Exploring the nature of the categories

Once we have a sense of where and how the categories are distributed over space, it is also useful to explore them statistically. This will allow us to characterize them, giving us an idea of the kind of observations subsumed into each of them. As a first step, let us find how many observations are in each category. To do that, we will make use of the groupby operator introduced before, combined with the function size, which returns the number of elements in a subgroup:

k5sizes = simd.groupby('kmeans_5').size()
k5sizes
kmeans_5
0    157
1    102
2    233
3     74
4    180
dtype: int64

The groupby operator groups a table (DataFrame) using the values in the column provided (kmeans_5) and passes the groups on to the function called afterwards, which in this case is size. Effectively, this groups the observations by the categories created and counts how many of them each category contains. For a more visual representation of the output, a bar plot is a good alternative:

_ = k5sizes.plot.bar()

As we suspected from the map, groups vary in size, with group 2 containing over 230 observations, groups 0, 1 and 4 between roughly 100 and 180, and group 3 having 74 observations.

In order to describe the nature of each category, we can look at the values of the attributes we used to create them in the first place. Remember, we used the sub-ranks on several aspects of deprivation to create the classification, so we can begin by checking the average value of each. To do that in Python, we will again rely on the groupby operator, this time combined with the function mean:

k5_means = simd.groupby('kmeans_5')[subranks].mean()
k5_means.T

Use groupby to calculate the mean of each sub-rank per cluster, then transpose the table so it is not too wide.
kmeans_5             0            1            2            3            4
IncRankv2   739.369427  5023.578431   595.560086  5291.804054  2722.902778
EmpRank     810.990446  5216.225490   711.864807  5758.695946  3116.936111
HlthRank    653.675159  4821.480392   631.545064  5763.851351  2725.005556
EduRank    1078.439490  5233.803922  1082.309013  5519.391892  2983.811111
GAccRank   3249.101911  3429.696078  5851.716738  5422.851351  5212.127778
CrimeRank  1696.554140  4542.950980  1291.233906  2841.182432  2732.405556
HouseRank   826.289809  3796.156863   551.736052  1079.175676  1187.669444

When interpreting the values, remember that a lower value represents higher deprivation. While the results seem plausible and there are ways of interpreting them, we have not used any spatial methods yet.
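A common way to characterise clusters further is to express each cluster mean as a deviation from the overall mean, measured in standard deviations, so above- and below-average clusters stand out at a glance. A minimal sketch on synthetic data (column and label names here are made up; with the real table you would group simd[subranks] by kmeans_5):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# synthetic stand-in for a sub-rank table plus a cluster label column
df = pd.DataFrame(
    rng.integers(1, 7000, size=(100, 3)),
    columns=["rank_a", "rank_b", "rank_c"],
)
df["label"] = rng.integers(0, 5, size=100)

overall = df[["rank_a", "rank_b", "rank_c"]]
# cluster means expressed as deviations from the overall mean,
# in units of each column's standard deviation
profiles = (df.groupby("label").mean() - overall.mean()) / overall.std()
```

Positive entries mark clusters that sit above the city-wide average on that sub-rank (i.e. less deprived on that domain), negative entries the opposite.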

Spatially-lagged clustering

K-means, as used above, is entirely aspatial: it looks only at the attribute values, not at where each area sits on the map. One simple way to bring geography into the classification is to compute the spatial lag of each variable (the average value among an area's neighbours) and include the lagged variables as additional attributes. We first build a queen contiguity graph, row-standardise it, and create a lagged version of every sub-rank:
queen = graph.Graph.build_contiguity(simd)
queen_row = queen.transform("R")
for column in subranks:
    simd[column + "_lag"] = queen_row.lag(simd[column])
simd.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 746 entries, 0 to 745
Data columns (total 60 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   DataZone       746 non-null    object  
 1   DZName         746 non-null    object  
 2   LAName         746 non-null    object  
 3   SAPE2017       746 non-null    int64   
 4   WAPE2017       746 non-null    int64   
 5   Rankv2         746 non-null    int64   
 6   Quintilev2     746 non-null    int64   
 7   Decilev2       746 non-null    int64   
 8   Vigintilv2     746 non-null    int64   
 9   Percentv2      746 non-null    int64   
 10  IncRate        746 non-null    object  
 11  IncNumDep      746 non-null    int64   
 12  IncRankv2      746 non-null    float64 
 13  EmpRate        746 non-null    object  
 14  EmpNumDep      746 non-null    int64   
 15  EmpRank        746 non-null    float64 
 16  HlthCIF        746 non-null    int64   
 17  HlthAlcSR      746 non-null    int64   
 18  HlthDrugSR     746 non-null    int64   
 19  HlthSMR        746 non-null    int64   
 20  HlthDprsPc     746 non-null    object  
 21  HlthLBWTPc     746 non-null    object  
 22  HlthEmergS     746 non-null    int64   
 23  HlthRank       746 non-null    int64   
 24  EduAttend      746 non-null    object  
 25  EduAttain      746 non-null    float64 
 26  EduNoQuals     746 non-null    int64   
 27  EduPartici     746 non-null    object  
 28  EduUniver      746 non-null    object  
 29  EduRank        746 non-null    int64   
 30  GAccPetrol     746 non-null    float64 
 31  GAccDTGP       746 non-null    float64 
 32  GAccDTPost     746 non-null    float64 
 33  GAccDTPsch     746 non-null    float64 
 34  GAccDTSsch     746 non-null    float64 
 35  GAccDTRet      746 non-null    float64 
 36  GAccPTGP       746 non-null    float64 
 37  GAccPTPost     746 non-null    float64 
 38  GAccPTRet      746 non-null    float64 
 39  GAccBrdbnd     746 non-null    object  
 40  GAccRank       746 non-null    int64   
 41  CrimeCount     746 non-null    int64   
 42  CrimeRate      746 non-null    int64   
 43  CrimeRank      746 non-null    float64 
 44  HouseNumOC     746 non-null    int64   
 45  HouseNumNC     746 non-null    int64   
 46  HouseOCrat     746 non-null    object  
 47  HouseNCrat     746 non-null    object  
 48  HouseRank      746 non-null    float64 
 49  Shape_Leng     746 non-null    float64 
 50  Shape_Area     746 non-null    float64 
 51  geometry       746 non-null    geometry
 52  kmeans_5       746 non-null    int32   
 53  IncRankv2_lag  746 non-null    float64 
 54  EmpRank_lag    746 non-null    float64 
 55  HlthRank_lag   746 non-null    float64 
 56  EduRank_lag    746 non-null    float64 
 57  GAccRank_lag   746 non-null    float64 
 58  CrimeRank_lag  746 non-null    float64 
 59  HouseRank_lag  746 non-null    float64 
dtypes: float64(23), geometry(1), int32(1), int64(22), object(13)
memory usage: 346.9+ KB
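The _lag columns created above hold, for each area, the average of its neighbours' values under the row-standardised queen weights. The computation behind the lag can be sketched as a matrix-vector product on a toy graph (an illustration, not libpysal's internal code):

```python
import numpy as np

# toy row-standardised contiguity matrix for three areas in a line (0 - 1 - 2):
# each row holds equal weights over that area's neighbours, summing to one
W = np.array([
    [0.0, 1.0, 0.0],  # area 0 neighbours: {1}
    [0.5, 0.0, 0.5],  # area 1 neighbours: {0, 2}
    [0.0, 1.0, 0.0],  # area 2 neighbours: {1}
])
values = np.array([10.0, 20.0, 40.0])

# the spatial lag of each area is the average of its neighbours' values
lag = W @ values
print(lag)  # → [20. 25. 20.]
```

Feeding both the original and the lagged attributes to K-means rewards groupings whose members resemble not just each other but also each other's surroundings, which tends to produce spatially smoother (though not guaranteed contiguous) clusters.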
subranks_lag = [column + "_lag" for column in subranks]
subranks_lag
['IncRankv2_lag',
 'EmpRank_lag',
 'HlthRank_lag',
 'EduRank_lag',
 'GAccRank_lag',
 'CrimeRank_lag',
 'HouseRank_lag']
subranks_spatial = subranks + subranks_lag
subranks_spatial
['IncRankv2',
 'EmpRank',
 'HlthRank',
 'EduRank',
 'GAccRank',
 'CrimeRank',
 'HouseRank',
 'IncRankv2_lag',
 'EmpRank_lag',
 'HlthRank_lag',
 'EduRank_lag',
 'GAccRank_lag',
 'CrimeRank_lag',
 'HouseRank_lag']
kmeans5_lag = cluster.KMeans(n_clusters=5, random_state=42)
kmeans5_lag.fit(simd[subranks_spatial])
KMeans(n_clusters=5, random_state=42)
simd["kmeans_5_lagged"] = kmeans5_lag.labels_
simd[["kmeans_5_lagged", 'geometry']].explore("kmeans_5_lagged", categorical=True, tiles="CartoDB Positron")
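To see how much adding the lagged attributes reshuffles the classification, you can cross-tabulate the two label columns; off-diagonal entries are areas that changed group. A sketch with synthetic labels (with the real table you would pass simd["kmeans_5"] and simd["kmeans_5_lagged"] directly; note that k-means label numbers are arbitrary, so with real output the large counts need not sit on the diagonal):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# synthetic label vectors standing in for the two classifications
labels_plain = rng.integers(0, 5, size=746)
labels_lagged = labels_plain.copy()
flip = rng.random(746) < 0.2  # perturb about 20% of the assignments
labels_lagged[flip] = rng.integers(0, 5, size=int(flip.sum()))

# rows: plain labels, columns: lagged labels; a heavy diagonal
# means the two classifications largely agree
crossed = pd.crosstab(
    pd.Series(labels_plain, name="kmeans_5"),
    pd.Series(labels_lagged, name="kmeans_5_lagged"),
)
```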

Spatially-constrained clustering (regionalisation)

A different strategy is to constrain the clustering algorithm itself so that only contiguous observations can end up in the same group, turning clusters into regions. Agglomerative clustering in scikit-learn accepts a connectivity matrix for this purpose, so we can pass the sparse adjacency of the queen contiguity graph built earlier:
agg5 = cluster.AgglomerativeClustering(n_clusters=5, connectivity=queen.sparse)
agg5.fit(simd[subranks])
/home/runner/micromamba/envs/sds/lib/python3.11/site-packages/sklearn/cluster/_agglomerative.py:303: UserWarning: the number of connected components of the connectivity matrix is 2 > 1. Completing it to avoid stopping the tree early.
  connectivity, n_connected_components = _fix_connectivity(
AgglomerativeClustering(connectivity=<746x746 sparse array of type '<class 'numpy.int64'>'
    with 4126 stored elements in COOrdinate format>,
                        n_clusters=5)
simd["agg_5_lagged"] = agg5.labels_
simd[["agg_5_lagged", 'geometry']].explore("agg_5_lagged", categorical=True, tiles="CartoDB Positron")
simd_regions = simd[["agg_5_lagged", "geometry"]].dissolve("agg_5_lagged")
simd_regions
geometry
agg_5_lagged
0 MULTIPOLYGON (((259347.187 665724.819, 259330....
1 POLYGON ((253798.600 657973.600, 253742.300 65...
2 MULTIPOLYGON (((251902.559 667950.621, 251901....
3 POLYGON ((254848.289 667194.445, 254815.530 66...
4 POLYGON ((257119.300 659900.400, 257112.887 65...
simd_regions.reset_index().explore("agg_5_lagged", categorical=True, tiles="CartoDB Positron")
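The dissolve used above can be illustrated with a toy example: four unit squares carrying two labels collapse into two region geometries (a sketch independent of the SIMD data):

```python
import geopandas as gpd
from shapely.geometry import box

# four adjacent unit squares, two of them sharing each label
squares = gpd.GeoDataFrame(
    {"label": [0, 0, 1, 1]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
)

# dissolve merges all geometries sharing a label into one, just as
# simd.dissolve("agg_5_lagged") merges data zones into regions above
regions = squares.dissolve("label")
```

Because regionalisation guarantees that members of a group are contiguous, each dissolved region is a single connected shape; with unconstrained clustering (like plain K-means) the same operation would typically yield scattered multipolygons instead.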